Understanding demographics is crucial for gaining insights into the composition, dynamics, and trends of human populations. Demographic factors encompass a wide range of characteristics, including population size, birth rates, economic features, land distribution, and linguistic diversity. These factors play a fundamental role in shaping societies, economies, and environments around the world. Demographic data provides insights into the statistical characteristics of human populations, such as age, gender, race, ethnicity, income, education, and employment status. It involves the study and analysis of population size, structure, distribution, and trends over time, offering valuable information about social, economic, and environmental patterns and trends.
In this study, the focus will be on five main areas: population, economic features, land distribution, social aspects, and linguistic diversity. As of 2023, the world had 195 countries (Science Focus, 2023), reflecting the breadth and diversity of the global landscape. For practical purposes, the study groups countries into five continents: the Americas, Asia, Africa, Europe, and Oceania.
Reference: Science Focus. (2023). How many countries are there? Retrieved from https://www.sciencefocus.com/planet-earth/how-many-countries-are-there
The objective of this study is to explore the interplay of demographics, land distribution, economics, and social indicators worldwide. Leveraging a comprehensive dataset spanning 195 countries as of 2023, the goal is to unearth valuable insights into the complex relationships among these variables.
Population and Land Distribution: The study aims to explore how population and land are distributed across different regions of the world, shedding light on patterns of human settlement and resource allocation.
Relationships Between Demographic Variables: By examining demographic variables such as population size and land distribution, the study seeks to uncover relationships and dependencies between these factors, providing a deeper understanding of demographic dynamics.
Interactions Between Demographic Factors and Economic Indicators: The analysis delves into how demographic factors interact with economic indicators, offering insights into the socio-economic landscape of different continents and countries.
Correlations Between Population, Urbanization Rates, and Labor Force Participation: By analyzing correlations between population, urbanization rates, and labor force participation, the study aims to understand urban development patterns and labor market dynamics.
Influence of Demographic Patterns on Language Distribution and Cultural Diversity: The study investigates how demographic patterns, such as population size and distribution, influence language distribution and cultural diversity across countries and continents.
Exploration of Agricultural and Forested Areas: The analysis includes an exploration of the portions of agricultural and forested areas per continent, providing insights into land use patterns and environmental sustainability.
Application of Central Limit Theorem Principles: Using the Life Expectancy variable from the dataset, the study applies principles of the Central Limit Theorem to understand how sample means behave and reflect the distribution of the population.
Employment of Various Sampling Techniques: The study employs various sampling techniques to explore how different sampling methods reflect the behavior of a population, offering insights into the reliability and validity of sample-based conclusions.
The dataset analyzed in this study was taken from Kaggle: https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023. This dataset contains demographic, economic, health, education, social, and related information about the 195 countries in the world as of July 2023.
The dataset underwent an initial inspection in Excel, revealing certain variables with non-ideal formats for R analysis. Consequently, additional preprocessing steps were deemed necessary within R, as detailed in the subsequent sections of this analysis. The specific operations conducted in Excel are outlined below:
All the R code is provided below. A brief summary of the changes includes:
library(plotly)
library(tidyverse)
library(sampling)
if (!is.element("countrycode", installed.packages()[,"Package"]))
  install.packages("countrycode", repos="http://cran.us.r-project.org",
                   dependencies = TRUE)
if (!is.element("scales", installed.packages()[,"Package"]))
  install.packages("scales", repos="http://cran.us.r-project.org",
                   dependencies = TRUE)
library(countrycode)
library(scales)
#Read file and preprocessing
df <- read.csv("world-data-2023.csv")
# Variable definitions
options(scipen = 999)
bu_id <- 4963  # seed reused by every sampling step
# Data frame that stores the population and sample means and standard deviations.
# Columns are named with `=`; using `<-` here would create mangled column names.
central_lim_the <- data.frame(
  sample_size = 0,
  mean = 0.0,
  sd = 0.0
)
# Convert formatted strings such as "1,234", "$5" or "12%" to numeric values
fields_to_convert <- c("CPI.Change....", "CPI", "Agricultural.Land....",
                       "Land.Area.Km2.", "Density..P.Km2.", "Forested.Area....",
                       "Gross.primary.education.enrollment....",
                       "Gross.tertiary.education.enrollment....",
                       "GDP", "Life.expectancy", "Population",
                       "Population..Labor.force.participation....", "Urban_population",
                       "Unemployment.rate")
df <- df |>
  mutate(across(all_of(fields_to_convert),
                ~ as.numeric(str_replace_all(as.character(.x), "[,$%]", ""))))
##Add continent data
df$Continent <- countrycode(df$Country, origin = 'country.name', destination = 'continent')
# Impute missing life expectancy values with the mean so sampling has a complete numeric column
df$Life.expectancy[is.na(df$Life.expectancy)] <- round(mean(df$Life.expectancy, na.rm = TRUE), 2)
#Add GDP per capita
df$GDP_per_capita <- df$GDP / df$Population
world_data <- as_tibble(df)
#Agricultural
world_data$Agricultural_km <- world_data$Land.Area.Km2. * world_data$Agricultural.Land..../100
world_data$Forested_km <- world_data$Land.Area.Km2. * world_data$Forested.Area..../100
# Labor force participation: convert the percentage into a head count
world_data$Labor_force_participation <- world_data$Population * world_data$Population..Labor.force.participation.... / 100
Understanding demographics is crucial for gaining insights into the composition, dynamics, and trends of human populations. Demographic factors encompass a wide range of characteristics, including population size, age distribution, birth rates, land distribution, and linguistic diversity. These factors play a fundamental role in shaping societies, economies, and environments around the world. This section explores the importance of demographics and discusses the significance of population distribution and land distribution by continent.
This analysis offers a comprehensive examination of global population distribution, considering variables such as country and continent. Through various data wrangling techniques, including aggregation, summarization, and sorting, insights are derived from the dataset comprising 195 countries. Grouping data by country and continent provides a holistic perspective, while spotlighting the top N countries for specific variables sheds light on noteworthy trends.
Below, distribution of population across countries and continents is analyzed, unraveling patterns and disparities in global demographics.
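population_by_continent <-
  world_data |>
  group_by(Continent) |>
  summarise(
    # na.rm = TRUE keeps one missing country value from turning a continent total into NA
    Population = sum(Population, na.rm = TRUE))
population_by_continent$Latitude <- NA
population_by_continent$Longitude <- NA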
geo <- list(
scope = 'world',
projection = list(type = 'natural earth'),
showland = TRUE,
landcolor = toRGB("gray95"),
subunitcolor = toRGB("gray85"),
countrycolor = toRGB("gray85"),
countrywidth = 0.5,
subunitwidth = 0.5
)
world_data |>
plot_geo(lat = ~Latitude, lon = ~Longitude) |>
add_markers(text = ~paste(Country, paste("Total:", comma(Population)), sep = "<br />"),
color = ~Population, symbol = I("square"), size = ~Population, hoverinfo = "text"
) |>
colorbar(title = "Distribution") |>
layout(title = 'World Population Distribution<br />(Hover for country)',
geo = geo
) |>
  add_annotations(
    # Plain string instead of a formula, so a single annotation is drawn
    text = paste("Global Population:",
                 comma(sum(world_data$Population, na.rm = TRUE))),
    x = 1.2,
    y = 0,
    font = list(color = "blue"),
    xref = "paper",
    yref = "paper",
    showarrow = FALSE)
The top 20 most populated countries offer a useful perspective on global demographics. Understanding population distribution is vital across disciplines such as economics, the social sciences, and geography: it shows how the human population is spread across continents and points to the factors driving population growth and density. Looking at the top 20 most populated countries highlights significant demographic trends, economic implications, and social challenges at a global level.
#############################################################
#Top N most populated countries
Population_order <-
world_data |>
select(Country, Population, Continent, Life.expectancy, Unemployment.rate) |>
arrange(desc(Population))
Population_top <- Population_order[1:20, ]
Population_top |>
  plot_ly(x = ~Population, y = ~Country, color = ~Continent) |>
  layout(title = "Top 20 most populated countries with a reference to Continent",
         yaxis = list(categoryorder='total ascending', standoff = 50),
         margin = list(l = 10)  # Adjust the left margin as needed
  )
Understanding the distribution of land and population across continents is crucial for informed decision-making and sustainable development. Asia’s higher population density compared to the Americas, despite its smaller landmass, can possibly be attributed to historical settlement patterns, economic opportunities, cultural factors, and geographic features. Analyzing this distribution could help manage resources, plan socio-economic development, mitigate environmental impact, and make informed economic decisions.
land_by_continent <-
world_data |>
group_by(Continent) |>
summarise(
sum_land = sum(Land.Area.Km2., na.rm = TRUE),
sum_agri = sum(Agricultural_km, na.rm = TRUE),
forested_land = sum(Forested_km, na.rm = TRUE)
)
#Pie group land_by_continent vs population_by_continent
fig <- plot_ly()
# Grid cells are zero-indexed, so both pies sit in row 0 of the 1 x 2 grid
fig <- fig %>% add_pie(data = land_by_continent, labels = ~Continent, values = ~sum_land,
                       name = "Land", domain = list(row = 0, column = 0),
                       texttemplate = "%{label}<br>(%{percent})",
                       textposition = "inside",
                       title = "Land distribution")
fig <- fig %>% add_pie(data = population_by_continent, labels = ~Continent, values = ~Population,
                       name = "Population", domain = list(row = 0, column = 1),
                       texttemplate = "%{label}<br>(%{percent})",
                       textposition = "inside",
                       title = "Population distribution")
fig <- fig %>% layout(title = "Land distribution vs Population distribution", showlegend = TRUE,
                      grid = list(rows = 1, columns = 2),
                      xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                      yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
Another interesting variable to examine is life expectancy, the average number of years a person is expected to live. This metric is aggregated by continent, offering insights into population health and longevity trends. Exploring life expectancy distributions across continents shows regional disparities in health outcomes and points to socio-economic and environmental factors that may influence lifespan.
The analysis highlights significant variations in life expectancy across continents, with Europe exhibiting notably higher life expectancies compared to other regions. Conversely, Africa emerges with the lowest life expectancies, potentially influenced by economic disparities, limited access to healthcare services, prevalent diseases, and socio-political challenges. While the study refrains from delving into detailed analyses of these factors, it offers valuable insights into regional life expectancy disparities, warranting further investigation into their underlying determinants.
At a global scale, the median life expectancy across the countries in the dataset is around 73 years, so about half of the countries have a life expectancy at or below that value. The highest national life expectancy recorded in the dataset is 85.4 years. These figures are country-level values and are not weighted by population.
# Life expectancy by continent
plot_ly(world_data, x = ~Life.expectancy, color = ~Continent, type = "box",
        boxpoints = "all") |>
  layout(title = "Life Expectancy Per Continent + Global",
         yaxis = list(title = list(text = 'Continent', standoff = 35))
  ) |>
  # I() sets a literal color instead of mapping the string to the color scale
  add_trace(x = ~Life.expectancy, name = "Global", color = I("yellow"))
The histogram of life expectancy across the world exhibits a left-skewed distribution, as shown by the longer tail on the left side of the histogram. The mean life expectancy of about 72 years and the standard deviation of about 7 years indicate that most countries cluster around this average, with the less common low values producing the skew.
# Life expectancy distribution
world_data |>
  plot_ly(x = ~Life.expectancy,
          type = "histogram",
          xbins = list(start = 50, size = 5)
  ) |>
  layout(xaxis = list(title = "Life Expectancy (Years)"),
         yaxis = list(title = "Frequency"),
         title = "Life Expectancy Distribution",
         bargap = 0.05
  )
## Mean: 72.27969 , Sd: 7.327734
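The skew can also be checked numerically rather than by eye, by comparing the mean with the median and computing a moment-based skewness coefficient. The sketch below uses base R only. The vector `x` is a hypothetical set of life expectancies invented for illustration, and `moment_skewness` is a helper name introduced here; neither is part of the original analysis.

```r
# Hypothetical life expectancies (illustrative values, not taken from the dataset)
x <- c(52, 58, 61, 66, 70, 72, 73, 74, 75, 76, 77, 78, 80, 81, 83)

# Moment-based skewness: third central moment divided by the cubed
# standard deviation (population form, dividing by n)
moment_skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^(3 / 2)
}

skew <- moment_skewness(x)

# A left-skewed sample has negative skewness and a mean below the median
left_skewed <- skew < 0 && mean(x) < median(x)
print(c(mean = mean(x), median = median(x), skewness = skew))
```

Applied to the report's data, the equivalent call would be `moment_skewness(world_data$Life.expectancy)`. A negative value together with a mean below the median would be consistent with the left skew described above.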
Urban population rates and labor force participation offer insight into the economic development of countries. Higher labor force participation is often associated with stronger economic activity, although the relationship is not automatic. The continents with the highest urban population percentages, Europe and the Americas, have around 75% of their populations residing in urban areas. In contrast, Asia and Africa have lower urban population percentages, around 50%, indicating a higher proportion of rural residents. This disparity in urbanization rates may reflect varying levels of economic development and infrastructure across continents.
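The continent-level percentages in this section are ratios of sums (total urban population divided by total population), which weights every country by its size. That is not the same as averaging each country's urban percentage. The base R sketch below illustrates the difference with a toy data frame; the continent labels and all figures are hypothetical.

```r
# Toy data (hypothetical): two continents with two countries each
toy <- data.frame(
  Continent = c("A", "A", "B", "B"),
  Population = c(1000, 100, 500, 500),
  Urban_population = c(800, 20, 250, 300)
)

# Ratio of sums: population-weighted urban share per continent
agg <- aggregate(cbind(Population, Urban_population) ~ Continent,
                 data = toy, FUN = sum)
agg$urban_share <- agg$Urban_population / agg$Population * 100

# Mean of ratios: unweighted average of the country-level percentages
toy$urban_pct <- toy$Urban_population / toy$Population * 100
unweighted <- tapply(toy$urban_pct, toy$Continent, mean)

print(agg)
print(unweighted)
```

In continent "A" the large, highly urban country dominates the weighted share, while the unweighted mean treats both countries equally. The two measures only agree when countries are of similar size, as in continent "B".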
# Get urban population and labor force participation per continent
pop_by_continent <-
  world_data |>
  group_by(Continent) |>
  summarise(
    sum_pop = sum(Population, na.rm = TRUE),
    sum_urban = sum(Urban_population, na.rm = TRUE),
    sum_labor = sum(Labor_force_participation, na.rm = TRUE)
  )
# Bars show head counts; the text labels show each count as a share of the continent's population
pop_by_continent |>
  plot_ly(y = ~Continent) |>
  add_trace(x = ~sum_pop, name = "Total Population",
            text = "") |>
  add_trace(x = ~sum_urban, name = "Urban Population Percentage",
            text = ~paste0(round(sum_urban/sum_pop*100, 2), "%")) |>
  add_trace(x = ~sum_labor, name = "Labor Force Participation Percentage",
            text = ~paste0(round(sum_labor/sum_pop*100, 2), "%")) |>
  layout(title = "Population vs Urban vs Labor Force distribution",
         xaxis = list(title = "Population size"),
         yaxis = list(title = list(text = "Continent", standoff = 45)),
         barmode = "group")
GDP (Gross Domestic Product) provides insight into the economic landscape of each country. The plot below shows the 20 countries with the highest GDP and their respective continents, giving an overview of how economic output is concentrated across regions. All the continents except Africa have a country that is part of the top 20 with the highest GDP.
GDP_top <-
world_data |>
select(Country, GDP, Continent, Life.expectancy, Unemployment.rate,
Population, GDP_per_capita) |>
arrange(desc(GDP)) |>
head(20)
# Top 20 countries with the highest GDP, by country
GDP_top |>
  plot_ly(x = ~Country, y = ~GDP, color = ~Continent) |>
  layout(title = "Top 20 countries with the highest GDP",
         xaxis = list(categoryorder='total descending'),
         yaxis = list(title = "GDP (US dollars)"),
         barmode = 'stack')
An alternative way to examine GDP distribution is to group the same 20 countries by continent. The plot below shows the contribution of each continent to the combined GDP of these countries, which makes it easy to identify the continents that contribute most. The primary focus of this view is GDP distribution across continents.
# Top 20 countries with the highest GDP, by continent (a different view)
GDP_top |>
  plot_ly(x = ~Continent, y = ~GDP, color = ~Country) |>
  layout(title = "Top 20 countries with the highest GDP by continent <br> with country included",
         xaxis = list(categoryorder='total descending'),
         barmode = 'stack')
When analyzing the connection between GDP per capita and life expectancy among the top 20 countries with the highest GDP, a clear pattern emerges: countries with a higher GDP per capita tend to have higher life expectancy. This trend suggests that greater economic prosperity is often linked to better healthcare systems, improved access to medical services, and enhanced overall well-being for the population.
#GDP and life expectancy
GDP_top |>
plot_ly(
x = ~GDP_per_capita,
y = ~Life.expectancy,
color = ~Country,
hoverinfo = "text",
type = 'scatter',
size = ~Life.expectancy,
mode = 'markers',
text = ~paste('Country:', Country, '<br>Life Expectancy:', Life.expectancy,
'<br>GDP Per Capita:$', comma(GDP_per_capita),
'<br>Population:', comma(Population))
) %>%
  layout(
    title = "Life Expectancy across the top 20 GDP countries",
    xaxis = list(
      type = "log",
      title = "GDP Per Capita (US dollars)"
    ),
    yaxis = list(title = "Life Expectancy (Years)"))
Exploring the relationship between GDP and unemployment offers valuable insights into the economic dynamics of different countries. While it’s commonly assumed that higher GDP levels correspond to lower unemployment rates, real-world observations often challenge this assumption. Japan, which ranks third in terms of GDP, has a low unemployment rate of just 3.86%, in line with that assumption. The United States, however, has the highest GDP and also the highest unemployment rate among these 20 countries, at around 15%.
These examples highlight the complexity of the relationship between GDP and unemployment. Several factors, such as labor market dynamics, government policies, and demographic trends, can influence unemployment rates independently of GDP levels. Therefore, analyzing both GDP and unemployment rates together provides a more comprehensive understanding of a country’s economic landscape and labor market conditions.
# GDP vs unemployment rate
GDP_top |>
  plot_ly(
    x = ~Unemployment.rate,
    y = ~Continent,
    color = ~Country,
    hoverinfo = "text",  # show the custom text built below
    type = 'scatter',
    mode = 'markers',
    text = ~paste('Country:', Country,
                  '<br>GDP Per Capita:$', comma(GDP_per_capita),
                  '<br>Unemployment Rate:', Unemployment.rate, "(%)"),
    size = ~Unemployment.rate
  ) %>%
  layout(
    title = "Unemployment rate across the top 20 GDP countries by continent",
    xaxis = list(title = "Unemployment Rate (%)",
                 type = "log"
    ),
    yaxis = list(title = "Continent")  # the y axis holds the continent, not GDP per capita
  )
Birth rates, defined as the number of births per 1,000 population per year, show a noticeable correlation with GDP per capita. Generally, countries with higher GDP per capita tend to have lower birth rates, suggesting a potential association with increased investment in education and healthcare infrastructure. This trend underscores the importance of socioeconomic development in shaping demographic trends, as higher levels of prosperity often coincide with greater access to family planning resources and educational opportunities.
This correlation between GDP per capita and birth rates offers valuable insights into the complex interplay between economic development and demographic trends, highlighting the multifaceted nature of population dynamics worldwide.
world_data |>
  plot_ly(x = ~GDP_per_capita, y = ~Birth.Rate, type = 'scatter', mode = 'markers',
          color = ~Continent,
          text = ~paste("Country: ", Country, "<br>",
                        "Birth Rate: ", Birth.Rate, "per 1,000")) |>
  layout(title = 'GDP per capita vs Birth Rate',
         xaxis = list(title = 'GDP Per Capita (US dollars)'),
         yaxis = list(title = 'Birth Rate (births per 1,000 population)'))
Exploring the relationship between total land area, agricultural proportion, and forested areas provides valuable insights into environmental sustainability and land management practices across diverse regions. The analysis shows that the Americas, comprising North America and South America, have the largest total land area, followed by Asia and then the remaining continents.
world_data$Agricultural_km <- world_data$Land.Area.Km2. * world_data$Agricultural.Land..../100
world_data$Forested_km <- world_data$Land.Area.Km2. * world_data$Forested.Area..../100
land_by_continent <-
world_data |>
group_by(Continent) |>
summarise(
sum_land = sum(Land.Area.Km2., na.rm = TRUE),
sum_agri = sum(Agricultural_km, na.rm = TRUE),
forested_land = sum(Forested_km, na.rm = TRUE)
)
land_by_continent |>
plot_ly(x = ~Continent) |>
add_trace(y = ~sum_land, name = "Total Land Area") |>
add_trace(y = ~sum_agri, name = "Agricultural Area") |>
add_trace(y = ~forested_land, name = "Forested Area") |>
  layout(title = "Land distribution across continents",
         xaxis = list(title = "Continent"),
         yaxis = list(title = "KM2"),
         barmode = "group")
The Central Limit Theorem (CLT) states that, for independent samples drawn from a population with finite variance, the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. In other words, when we take many samples from a population and calculate the mean of each one, the distribution of these sample means looks more normal as the sample size grows. As a result, even if the population distribution is not normal, we can rely on the approximate normality of sample means when making statistical inferences. This is what justifies common methods such as hypothesis testing and confidence interval estimation.
The Life Expectancy variable will be used to explore the CLT. The first step is to analyze the distribution of this variable.
df |>
plot_ly(x = ~Life.expectancy,
type = "histogram"
) |>
  layout(xaxis = list(title = "Life Expectancy (Age)"),
         yaxis = list(title = "Frequency")
  )
central_lim_the <- central_lim_the |> slice(0)
central_lim_the["Population", ] <- c(nrow(df),
                                     mean(df$Life.expectancy, na.rm = TRUE),
                                     sd(df$Life.expectancy, na.rm = TRUE))
The mean life expectancy of 72 years and a standard deviation of 7 indicate that the majority of countries have life expectancies clustered around this average, with less common values on the lower end contributing to the skewness. The presence of peaks at 72, 76, and 80 years old suggests that these are common life expectancy values among the sampled countries, while the dips at 74 and 78-79 years old may indicate less common life expectancies. This distribution pattern could imply that while a significant number of countries have life expectancies around the mean, there are also disparities in life expectancy values, possibly due to various socio-economic, healthcare, and environmental factors influencing longevity. Further analysis and investigation would be necessary to understand the underlying reasons for these distribution characteristics.
To illustrate the CLT, four groups of 1,000 samples each are drawn, with sample sizes of 10, 20, 30, and 40, as shown below:
# Draw `samples` samples of size `sample.size` and return their means
sam <- function(data, samples, sample.size, seed, replace_flag){
  set.seed(seed)
  xbar <- numeric(samples)
  for (i in 1:samples){
    xbar[i] <- mean(sample(data, size=sample.size, replace=replace_flag))
  }
  return (xbar)
}
data <- df$Life.expectancy
sample.sizes <- c(10, 20, 30, 40)
plots <- list()
j <- 0
samples <- 1000
for (i in sample.sizes){
  j <- j + 1
  # Sampling without replacement from the 195 countries
  xbar <- sam(data, samples, sample.size=i, seed=bu_id, replace_flag=FALSE)
  xbar <- as.data.frame(xbar)
  plots[[j]] <- plot_ly(xbar, x=~xbar, type = "histogram", histnorm = "probability",
                        name = paste("Sample", i))
  central_lim_the[paste("Sample", i),] <- c(sample.size=i,
                                            mean=mean(xbar$xbar, na.rm = TRUE),
                                            sd=sd(xbar$xbar, na.rm = TRUE))
}
subplot(
  plots,
  nrows = 2,
  shareY = TRUE
) |>
  layout(title = 'World life expectancy various samples sizes')
In this example, the mean of the sample means closely approximates the population mean for every sample size. As the sample size increases, however, the standard deviation of the sample means (the standard error) shrinks. This means that the variability of sample means around the population mean decreases with larger sample sizes: the larger the samples, the smaller the spread of their means and the more precise the estimate of the population mean. This behavior is consistent with what the Central Limit Theorem (CLT) predicts for sample means. Below are the means and standard deviations of each group:
##            sample_size     mean       sd
## Population         195 72.27969 7.327734
## Sample 10           10 72.43285 2.188349
## Sample 20           20 72.33062 1.530444
## Sample 30           30 72.32752 1.273337
## Sample 40           40 72.33646 1.078424
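The standard deviations in this table can be compared with the theoretical standard error of the mean. Because the samples are drawn without replacement from a finite population of 195 countries, a finite population correction applies. The sketch below is a rough check, not part of the original analysis: it takes the population size and standard deviation from the table and assumes simple random sampling without replacement. The names `se_infinite` and `se_fpc` are introduced here for illustration.

```r
# Values reported in the table above
N <- 195         # population size (number of countries)
S <- 7.327734    # population standard deviation of life expectancy
n <- c(10, 20, 30, 40)

# Standard error ignoring the finite population
se_infinite <- S / sqrt(n)

# Standard error with the finite population correction for
# simple random sampling without replacement
se_fpc <- se_infinite * sqrt(1 - n / N)

print(round(data.frame(n, se_infinite, se_fpc), 3))
```

Under these assumptions the corrected standard errors come out at roughly 2.26, 1.55, 1.23, and 1.03, which is close to the simulated values of 2.19, 1.53, 1.27, and 1.08 in the table.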
Various sampling methods are commonly used in statistics to select representative samples from populations. Simple random sampling (SRS) without replacement ensures each member of the population has an equal chance of selection, while systematic sampling involves selecting every kth element from a list after a random start. Stratified sampling divides the population into homogeneous subgroups (strata) and selects samples from each stratum. Systematic sampling with unequal probabilities assigns different probabilities of selection to elements based on certain criteria. Despite their differences, all sampling methods aim to select unbiased samples that accurately represent the population. The choice of method depends on factors such as population characteristics and research objectives.
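For stratified sampling with proportional allocation, each stratum receives a share of the sample proportional to its size. Rounding those shares independently does not always add up to the requested sample size. The base R sketch below shows the issue and one possible remedy, the largest-remainder rule. The per-continent counts are hypothetical and chosen only for illustration, and `sizes_lr` is a name introduced here.

```r
# Hypothetical number of countries per continent (illustrative, not from the dataset)
freq <- c(Africa = 54, Americas = 35, Asia = 47, Europe = 45, Oceania = 14)
sample.size <- 50

# Proportional allocation with independent rounding
exact <- sample.size * freq / sum(freq)
sizes <- round(exact)

# Largest-remainder allocation: take the floor, then hand the remaining
# units to the strata with the largest fractional parts
sizes_lr <- floor(exact)
short <- sample.size - sum(sizes_lr)
top <- order(exact - sizes_lr, decreasing = TRUE)[seq_len(short)]
sizes_lr[top] <- sizes_lr[top] + 1

print(rbind(rounded = sizes, largest_remainder = sizes_lr))
print(c(rounded_total = sum(sizes), largest_remainder_total = sum(sizes_lr)))
```

With these hypothetical counts, independent rounding allocates 51 units rather than 50, while the largest-remainder allocation always matches the requested total.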
central_lim_the <- central_lim_the |> slice(0)
central_lim_the["Population", ] <- c(nrow(df), mean(df$Life.expectancy),sd(df$Life.expectancy))
plot.1 <- df |>
plot_ly(x = ~Life.expectancy,
type = "histogram", histnorm = "probability",
name = "Population"
) |>
layout(xaxis = list(title = "Life Expectancy - Age"),
yaxis = list(title = "")
)
sample.size <- 50
# Simple random sampling without replacement (srswor)
set.seed(bu_id)
s <- srswor(sample.size, nrow(df))
rows <- (1:nrow(df))[s!=0]
df_srswr <- df[rows,]
central_lim_the["Srswor", ] <- c(nrow(df_srswr), mean(df_srswr$Life.expectancy),
sd(df_srswr$Life.expectancy))
sample.1 <- plot_ly(df_srswr, x=~Life.expectancy, type = "histogram", histnorm = "probability",
name = "SRSWOR") |>
layout(xaxis = list(title = "Life Expectancy - Age"),
yaxis = list(title = ""))
#Systematic sampling with unequal probabilities
set.seed(bu_id)
pik <- inclusionprobabilities(df$Life.expectancy, sample.size)
s <- UPsystematic(pik)
df_syssam <- df[s != 0, ]
central_lim_the["Systematic Sampling", ] <- c(nrow(df_syssam), mean(df_syssam$Life.expectancy),
sd(df_syssam$Life.expectancy))
sample.2 <- plot_ly(df_syssam, x=~Life.expectancy, type = "histogram", histnorm = "probability",
name = "Systematic") |>
layout(xaxis = list(title = "Life Expectancy - Age"),
yaxis = list(title = ""))
###Stratified sampling
set.seed(bu_id)
order.index <- order(df$Continent)
data <- df[order.index, ]
freq <- table(data$Continent)
sizes <- round(sample.size * freq / sum(freq))
st <- sampling::strata(data, stratanames = c("Continent"),
size = sizes, method = "srswor")
df_strat <- sampling::getdata(data, st)
central_lim_the["Stratified Sampling", ] <- c(nrow(df_strat), mean(df_strat$Life.expectancy),
sd(df_strat$Life.expectancy))
sample.3 <- plot_ly(df_strat, x=~Life.expectancy, type = "histogram", histnorm = "probability",
name = "Stratified") |>
layout(xaxis = list(title = "Life Expectancy - Age"),
yaxis = list(title = ""))
subplot(
  plot.1,
  sample.1,
  sample.2,
  sample.3,
  nrows = 4,
  shareY = TRUE,
  titleY = FALSE, titleX = TRUE  # axis-title switches belong to subplot(), not layout()
) |>
  layout(title = 'World life expectancy various sampling methods',
         margin = list(l = 60, b = 0.07))
Different sampling methods introduce varying levels of bias and variability into the sample selection process, leading to differences in the means and standard deviations observed in samples taken from the same population. For example, systematic sampling may introduce bias if there is a periodic pattern in the population list, while stratified sampling aims to reduce bias by ensuring adequate representation of subgroups. The variability of sample means also depends on the sampling method, with systematic sampling with unequal probabilities potentially resulting in higher variability than simple random sampling. Therefore, careful consideration of population characteristics and research objectives is essential when choosing the most appropriate sampling method to obtain representative and unbiased samples.
The means and standard deviations for the different sampling methods are shown below:
##                     sample_size     mean       sd
## Population                  195 72.27969 7.327734
## Srswor                       50 73.03720 6.697513
## Systematic Sampling          50 73.61480 6.153755
## Stratified Sampling          50 71.46520 8.256384
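The systematic sample in this analysis uses inclusion probabilities proportional to life expectancy itself, so countries with higher life expectancy are more likely to be selected and the plain sample mean is pulled upward. The sketch below illustrates this with base R. It assumes that the inclusion probabilities take the form `n * x / sum(x)` with none capped at 1, which is how probability-proportional-to-size designs are usually defined. The toy vector `x` is hypothetical, and `expected_plain_mean` and `expected_from_report` are names introduced here.

```r
# Hypothetical life expectancies for a tiny population (illustrative only)
x <- c(55, 62, 70, 74, 78, 81, 84)
n <- 3

# Inclusion probabilities proportional to size (assumed form, no capping at 1)
pik <- n * x / sum(x)
stopifnot(all(pik < 1), isTRUE(all.equal(sum(pik), n)))

pop_mean <- mean(x)

# Expected value of the plain (unweighted) sample mean under this design
expected_plain_mean <- sum(pik * x) / n

# The same expectation computed from the summary statistics reported above:
# sum(x^2) / sum(x) = mean + population variance / mean
N <- 195; mu <- 72.27969; S <- 7.327734
expected_from_report <- mu + ((N - 1) / N) * S^2 / mu

print(c(pop_mean = pop_mean,
        expected_plain_mean = expected_plain_mean,
        expected_from_report = expected_from_report))
```

Under these assumptions the expected unweighted mean for the reported data is roughly 73.0 years, above the population mean of 72.28. Part of the upward shift seen in the systematic sample is therefore what the design itself would produce, rather than chance alone.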
The analysis undertaken in this study primarily focused on examining three key aspects: economics, demographics, and land distribution, with a supplementary investigation into language distribution. Through this exploration, several intriguing relationships and insights emerged:
Population distribution: Population is spread unevenly across continents, and a continent's share of the world population does not track its share of land.
Land distribution inequality: The examination of land distribution across continents highlighted significant disparities, with certain regions exhibiting disproportionate access to land resources compared to others. This inequality raises important questions about land management practices and resource utilization.
Economic inequality: The analysis of GDP distribution revealed substantial inequality, with a notable concentration of economic wealth in specific countries. This concentration implies challenges related to resource allocation, economic development, and global economic equity.
GDP vs unemployment rate: Contrary to common assumptions, the analysis revealed that GDP and unemployment rates are not necessarily directly related in all cases. This finding underscores the complexity of economic dynamics and suggests that other factors may influence employment levels independently of GDP.
Language distribution: While not extensively explored, the analysis hinted at the diverse linguistic landscape across continents, underscoring the importance of language diversity and its implications for cultural exchange and communication.
Overall, this analysis provides valuable insights into the multifaceted nature of global dynamics, emphasizing the significance of diversity and the prevalence of inequality across various socio-economic and environmental dimensions. It underscores the importance of addressing these disparities to foster more equitable and sustainable development pathways globally.
Social-Cultural
Languages across the globe
Language distribution data offer insights into cultural diversity, linguistic diversity, and communication patterns across countries and continents. Understanding language distribution is essential for promoting cultural exchange, facilitating international communication, and developing effective marketing strategies, education programs, and language policies that respect linguistic diversity and promote inclusivity.
To take an even closer approach to language distribution and offer insights into which languages might be advantageous to learn, let’s examine the distribution of languages on a country-by-country basis. Specifically, the focus will be on countries where the official language aligns with one of the most widely spoken languages globally. This approach will provide a closer look at how languages are distributed across different regions and offer valuable information for individuals considering language acquisition.
Language distribution by number of countries where that language is official
One final consideration regarding language distribution is the prevalence of specific official languages across countries. While Chinese is the most spoken language by number of native speakers, this does not translate into the number of countries where it is an official language. Certain languages have a very large number of speakers but are officially recognized in only a few countries. This underscores the complexity of language distribution and its relationship to cultural, historical, and political factors within each country.
Top 20 languages by number of countries